Given a reference computer, Kolmogorov complexity is a well defined
function on all binary strings. In the standard approach, however, only
the asymptotic properties of such functions are considered because they
do not depend on the reference computer. We argue that this approach
can be more useful if it is refined to include an important practical
case of simple binary strings. Kolmogorov complexity calculus may be
developed for this case if we restrict the class of available reference
computers. The interesting problem is to define a class of computers which
is restricted in a natural way modeling the real-life situation where only
a limited class of computers is physically available to us. We give an
example of what such a natural restriction might look like mathematically,
and show that under such restrictions some error terms, even logarithmic
in complexity, can disappear from the standard complexity calculus.



1 Introduction

The asymptotic nature of Kolmogorov complexity calculus renders it
significantly less useful in practical applications such as inference by the
minimum description length (MDL) principle [?]. In the classical MDL
approach [10] this problem is solved by replacing Kolmogorov complexity with
a phenomenological complexity measure just before performing the actual
inference. Such a measure can be chosen to suit a particular application,
whereas the general form of the MDL constructions can be considered as a
consequence of the asymptotic properties of Kolmogorov complexity (consult
section 5.5 in Ref. [?]). Here we propose a different approach. We argue that
Kolmogorov complexity can become more practical if we restrict the class of
reference computers.

Computer science is not the only field which can benefit from the proposed
research. There is a growing interest in using Kolmogorov complexity as a
fundamental physical concept. This includes applications in thermodynamics
[?], theory of chaos [?], physics of computation (consult [?] and references
therein), and many other areas of modern theoretical physics [?, 12, 13]. It
is however very difficult to use Kolmogorov complexity in any concrete
physical setting, or indeed, in any concrete application. For that we need a
much more detailed calculus that can be applied to particular cases of
reference computers. The main aim of this article is to stimulate further
research in developing such a practical complexity calculus.

This article is organized as follows. In section 2 we review some basic
definitions. In section 3 we present the main conceptual arguments of the
paper. In section 4 we give an example of how one can build a restricted
class of computers in a "natural" way. Considering one of the central
equalities of the standard complexity calculus we give an illustration of how
the error terms may be reduced.



2 Basic definitions 

Let $X = \{\Lambda, 0, 1, 00, 01, 10, 11, 000, \dots\}$ be the set of finite
binary strings where $\Lambda$ is the string of length 0. A set of strings
$Y \subset X$ with the property that no string in $Y$ is a prefix of another
is called an instantaneous code. A prefix computer is a partial recursive
function $C : Y \times X \to X$. For each $p \in Y$ (program string) and for
each $d \in X$ (data string) the output of the computation is either
undefined or given by $C(p,d) \in X$. The Kolmogorov complexity of a string
$\alpha$ given a data string $d$ relative to a computer $C$ is defined as the
length $K_C(\alpha|d)$ of the shortest program that makes $C$ compute
$\alpha$ given data $d$:

$$K_C(\alpha|d) = \min_p \{\, |p| \;:\; C(p,d) = \alpha \,\}, \tag{1}$$

where $|p|$ denotes the length of the program $p$ (in bits).
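
As a minimal sketch of definition (1), the following Python fragment evaluates the conditional complexity by brute force for a hypothetical toy computer given as a finite lookup table. The table `TOY_C` and the helper names are illustrative assumptions and are not taken from the text; a genuine prefix computer is a partial recursive function, for which this minimum is not computable in general.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy computer C: a finite lookup table (program, data) -> output.
# The program set Y = {"0", "10", "110", "111"} is prefix-free.
TOY_C: Dict[Tuple[str, str], str] = {
    ("0", ""): "0000",
    ("10", ""): "0101",
    ("110", ""): "0000",
    ("111", ""): "1",
    ("0", "1"): "1111",
    ("10", "1"): "0000",
}


def is_prefix_free(words) -> bool:
    """True if no word is a proper prefix of another (instantaneous code)."""
    ws = sorted(set(words))
    # In lexicographic order, a word that is a prefix of any other word
    # is also a prefix of its immediate successor.
    return not any(b.startswith(a) for a, b in zip(ws, ws[1:]))


def complexity(alpha: str, d: str) -> Optional[int]:
    """K_C(alpha|d): length of the shortest program p with C(p, d) = alpha."""
    lengths = [len(p) for (p, data), out in TOY_C.items()
               if data == d and out == alpha]
    return min(lengths) if lengths else None  # None: C never outputs alpha


assert is_prefix_free(p for (p, _d) in TOY_C)
print(complexity("0000", ""))   # two programs produce "0000"; the shorter has 1 bit
print(complexity("0000", "1"))  # with data "1" the only program has 2 bits
```

The same string receives different complexities for different data strings, which is the conditional dependence expressed by $K_C(\alpha|d)$.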

Since this complexity measure depends strongly on the reference computer, it
is important to find an optimal computer $U$ such that the complexity of any
string relative to $U$ is not much higher than the complexity of the same
string relative to any other computer $C$. Mathematically, a computer $U$ is
called optimal if

$$\forall C \;\; \exists \kappa_C \text{ such that } \forall \alpha, d : \quad K_U(\alpha|d) \le K_C(\alpha|d) + \kappa_C , \tag{2}$$

where $\kappa_C$ is a constant depending on $C$ (and $U$) but not on $\alpha$
or $d$. It turns out that the set of prefix computers contains such a $U$
and, moreover, it can be constructed so that any prefix computer can be
simulated by $U$: for further details consult [?]. Such a $U$ is called a
universal prefix computer and its choice is not unique. Using some particular
universal prefix computer $U$ as a reference, the conditional Kolmogorov
complexity of $\alpha$ given $\beta$ is defined as $K_U(\alpha|\beta)$.

The above definitions are generalized for the case of many strings as follows.
We choose and fix a particular recursive bijection $B : X \times X \to X$ for
use throughout the rest of this paper. Let $\{\alpha^i\}_{i=1}^{n}$ be a set
of $n$ strings $\alpha^i \in X$. For $2 \le k \le n$ we define
$(\alpha^1, \alpha^2, \dots, \alpha^k) = B((\alpha^1, \dots, \alpha^{k-1}), \alpha^k)$,
and $(\alpha^1) = \alpha^1$. We can now define
$K_U(\alpha^1, \dots, \alpha^n | \beta^1, \dots, \beta^k) = K_U((\alpha^1, \dots, \alpha^n) \,|\, (\beta^1, \dots, \beta^k))$.
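
The text does not specify which recursive bijection $B$ is used. As one possible illustration, the sketch below builds such a bijection by identifying each binary string with its position in the list $\Lambda, 0, 1, 00, 01, \dots$ and applying the Cantor pairing function to the positions. The function names (`str_to_num`, `num_to_str`, `B`, `B_inv`, `tuple_code`) are hypothetical and this choice of $B$ is an assumption.

```python
from functools import reduce
from math import isqrt


def str_to_num(s: str) -> int:
    """Position of s in the list Λ, 0, 1, 00, 01, ... (the empty string maps to 0)."""
    return int("1" + s, 2) - 1


def num_to_str(n: int) -> str:
    """Inverse of str_to_num."""
    return bin(n + 1)[3:]  # strip the leading '0b1'


def B(x: str, y: str) -> str:
    """Assumed pairing X x X -> X: Cantor pairing on string positions."""
    a, b = str_to_num(x), str_to_num(y)
    return num_to_str((a + b) * (a + b + 1) // 2 + b)


def B_inv(z: str):
    """Recover the unique pair (x, y) with B(x, y) = z."""
    n = str_to_num(z)
    w = (isqrt(8 * n + 1) - 1) // 2      # index of the diagonal a + b = w
    b = n - w * (w + 1) // 2
    return num_to_str(w - b), num_to_str(b)


def tuple_code(*strings: str) -> str:
    """(a1, ..., ak) = B((a1, ..., a(k-1)), ak), with (a1) = a1."""
    return reduce(B, strings)


print(B("0", "1"))                 # "001"
print(B_inv(B("0", "1")))          # ('0', '1')
print(tuple_code("0", "1", "00"))  # nested pairing of three strings
```

Any other recursive bijection would serve equally well for the definitions above; only the fact that $B$ is fixed once and for all matters.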

For any two universal prefix computers $U_1$ and $U_2$ we have, by definition,
$|K_{U_1}(\alpha|\beta) - K_{U_2}(\alpha|\beta)| \le \kappa(U_1, U_2)$ where
$\kappa(U_1, U_2)$ is a constant that depends only on $U_1$ and $U_2$ and not
on $\alpha$ or $\beta$. Most of the research on Kolmogorov complexity is
focused on the asymptotic case of nearly random long strings, when
$\kappa(U_1, U_2)$ can be neglected in comparison to the value of the
complexity. In such cases, Kolmogorov complexity becomes an asymptotically
absolute measure of the complexity of individual strings. For this reason,
many fundamental properties of Kolmogorov complexity are established up to an
error term which is asymptotically small compared to the complexity of strings
involved. For instance, the standard analysis of the prefix Kolmogorov
complexity ([?], Section 3.9.2) gives

$$K_U(\alpha, \gamma|\beta) = K_U(\alpha|\gamma, \beta) + K_U(\gamma|\beta) + \Delta , \tag{3}$$

where $\Delta$ is an error term which grows logarithmically with the complexity of
the considered strings. This is an example of an asymptotic property that 
all Kolmogorov measures of complexity have irrespective of the choice of the 
reference computer. Of course, it is important to know that all Kolmogorov 
measures of complexity share many of their asymptotic properties. For any 
given reference computer, however, Kolmogorov complexity is a well defined 
function on all binary strings. Even from a purely mathematical viewpoint it is 
interesting to study the properties of such functions beyond the asymptotics. 
As for the applied viewpoint, consider, by analogy, mathematical analysis. 
This theory would be much less useful if we studied only asymptotic properties 
of functions. 


3 Main arguments

Without significant knowledge about the reference computer, Kolmogorov
complexity can be considered only up to an additive error term $O(1)$. Error
terms even as small as $O(1)$ make it impossible to use Occam's razor to
discriminate between simple hypotheses. The importance of this problem
becomes apparent once we recognize that the domain of simple hypotheses is
absolutely crucial in our every-day life as well as in fundamental science. In- 
deed, it is often the case that, after extensive analysis, the greatest scientific 
discoveries can be expressed in a form so simple that they are readily under- 
stood by even school children. 

Humans can relatively easily discriminate between different hypotheses even 
when the Kolmogorov complexities involved are rather small. This gives them 
an enormous advantage over the present-day theoretical models. A good exam- 
ple is Kepler's theory of planetary motion. In what was a major breakthrough 
in theoretical astronomy at the time, Kepler introduced elliptical orbits as a 
better alternative to the complicated Copernican planetary model of super- 
imposed epicycles. At the level of accuracy provided by Brahe's experiments, 
the original Copernican model had to be refined by introducing additional 
epicycles: the Keplerian theory appeared to be simpler and therefore better 
by Occam's razor. This apparently obvious fact cannot be established using 
the standard formalism of Kolmogorov complexity: whereas Kepler's theory 
can be simpler relative to some type of computers, the Copernican model can 
be simpler relative to some other type of reference computers. 

Much simpler examples can be found in tests that are designed by humans 
to test their own intelligence. A typical problem in such tests is to find the 
next element in a sequence of symbols. For example, if the first four elements 
of a sequence are 1,2,3,4 an intelligent person is supposed to see the simplest 
pattern and predict 5 as the next element of the sequence. As in the previ- 
ous example, all humans would agree that predicting 5 would correspond to 
the choice of the simplest hypothesis, whereas the standard formalism of Kol- 
mogorov complexity cannot be used to justify this. It seems entirely plausible 
that the ultimate theory of artificial intelligence and, in particular, inductive 
inference, can achieve human-like results only if the building blocks of the the- 
ory, such as Kolmogorov complexity, are made sensitive to small variations in 
the complexity of hypotheses.

The $O(1)$ ambiguity in the classical definition of Kolmogorov complexity and
the error terms like $\Delta$ in Eq. (3) are the price we pay for having an
unrestricted class of reference computers. Every human perceives complexity
with respect to their own built-in reference computer, the brain. As in the
case of abstract reference computers, human brains are not identical. However,
they are similar enough to allow for a sharper discrimination between
individual theories on the basis of their complexity. This suggests that
further progress in applications of Kolmogorov complexity to the theory of
induction can be made possible if we find a natural way of restricting the
class of reference computers.

We see from this discussion that some restrictions on the class of reference 
computers are needed. It is desirable, however, to have a complexity theory 
which would be as general as possible. As a compromise, we can try to group 
all possible reference computers into restricted classes. Although we may want 
to study all such classes, we can argue that due to biological, technological, and 
other limitations only one class of reference computers is physically available 
to us. A definition of this realistic class of reference computers would be the 
crucial link between the abstract theory of Kolmogorov complexity and the 
practical theories of induction and computer learning. 

What kind of restriction of the class of reference computers can be seen as
natural? It appears natural to assume that given some particular level of
technology one can build more powerful computers only at the expense of
a more complex internal design. In section 4 we use this observation to
construct an example of a "natural" restriction of the class of reference
computers. Roughly speaking, this restriction entails the requirement that
switching to a more complex reference computer should always be accompanied
by an equivalent reduction of program lengths. Using some particular universal
computer $U$ as a reference, we define the complexity of a computer $W_s$ from
the set $\{W_i\}$ given data $d$ as $K_U(s|d)$. We then construct a particular
set of computers $\{W_i\}$ such that the sum of the complexity of a computer
and the length of a program for it is the same for all equivalent¹ programs
and for all computers in the set $\{W_i\}$ (consult section 4 for details).
This gives us a tradeoff between computer complexity and program lengths
similar to what one would expect in the real world where we face various
practical limitations. Together with the original reference computer $U$, the
computers $\{W_i\}$ form a "naturally" restricted class. It is natural to
define a computer $W$ which is universal for this class by setting
$W(p, (s,d)) = W_s(p,d)$, where $U$ is included by defining $W_\Lambda = U$.
Using any such $W$ as a reference we can see that, in principle, even error
terms logarithmic in complexity can be removed from the standard complexity
calculus. In particular, we prove that for any triple of simple strings
$\alpha$, $\beta$, $\gamma$, we have

$$K_W(\alpha, \gamma|(\Lambda, \beta)) = K_W(\alpha|\gamma, \beta) + K_W(\gamma|(\Lambda, \beta)) + \mathrm{const} , \tag{4}$$

where the constant depends only on the reference machine $W$ (not on $\alpha$,
$\beta$ or $\gamma$). Apart from subtleties associated with the operation of
combining strings into pairs, this is analogous to Eq. (3) with the important
difference that the error term is replaced by a constant.

¹ Two programs $p_1$ and $p_2$ for computers $C_1$ and $C_2$ are called
equivalent iff $C_1(p_1, d) = C_2(p_2, d)$.

In the standard complexity calculus the above equation holds only up to an
error term which grows logarithmically with the complexity of the considered
strings. As we explained earlier, this is unacceptable if we want to analyze
the complexity of simple strings. The error terms are especially troublesome if
we want to use the complexity calculus as a part of inductive inference based
on the MDL principle. In such cases we are interested in the position of the
minimum rather than in the approximate value of complexity. The error term
can significantly shift the position of the minimum even when the errors in
the value of complexity are minor. This can introduce uncontrollable mistakes
in the inference results. In our case, however, equation (4) is exact in the
sense that the constant does not influence the position of critical points so
it can be safely ignored in applications such as induction by the MDL
principle.
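
The difference between a constant offset and a hypothesis-dependent error term can be seen in a small numerical sketch. The description lengths and error values below are made-up numbers chosen only for illustration; they do not come from the text.

```python
def argmin(values):
    """Index of the first smallest entry."""
    return min(range(len(values)), key=values.__getitem__)


# Hypothetical description lengths (in bits) of five candidate hypotheses.
exact = [12, 9, 8, 10, 13]

# Adding the same constant to every value leaves the minimum where it was.
with_constant = [v + 5 for v in exact]

# A small error that varies from hypothesis to hypothesis can move it.
errors = [0, -2, 1, 0, 0]
with_error_term = [v + e for v, e in zip(exact, errors)]

print(argmin(exact), argmin(with_constant), argmin(with_error_term))  # 2 2 1
```

Here errors of at most two bits are enough to select a different hypothesis, whereas the constant shift of five bits changes nothing about the selection.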



4 Example 

As we explained in section 3, a natural restriction of the class of reference
computers can make Kolmogorov complexity more useful in applications such
as inference and computer learning. In this section we consider one possible
way of making such a restriction. We show that, in the important case of
simple strings, the proposed restriction effectively removes the error term in
Eq. (3), which has important applications in physics [?].



Definition 1 

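Fix $\delta \in \mathbb{N}$. A set of strings $\mathbb{S}_\delta \subset X$ is
called $\delta$-simple iff for any two strings
$\alpha, \gamma \in \mathbb{S}_\delta$ we have

$$|\alpha| < \delta , \quad |\gamma| < \delta , \quad \text{and} \quad |(\alpha, \gamma)| < \delta , \tag{5}$$

where $|\cdot|$ denotes the string length.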

Following Chaitin [?], consider a list of infinitely many requirements
$(r_k, l_k(d))$ ($k = 0, 1, 2, \dots$) for the construction of a computer.
Each requirement $(r_k, l_k(d))$ requests that a program of length $l_k(d)$
be assigned to the result $r_k$ if the computer is given data $d$. The
requirements are said to satisfy the Kraft inequality if
$\sum_k 2^{-l_k(d)} \le 1$. For such requirements there exists an
instantaneous code characterized by the set of string lengths $\{l_k(d)\}$.
A computer $C$ is said to satisfy the requirements if there are precisely as
many programs $p$ of length $l(d)$ such that $C(p, d) = r$ as there are pairs
$(r, l(d))$ in the list of requirements.
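
For a finite list of requested lengths, the Kraft inequality and the existence of a matching instantaneous code can be checked directly. The sketch below uses the standard canonical construction (codewords assigned in order of increasing length); this is a textbook technique and not necessarily the construction used in the cited theorem. The function names are illustrative.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of the sum of 2^(-l) over the requested lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def instantaneous_code(lengths):
    """Return prefix-free codewords with the requested lengths, in input order."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words = [""] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value <<= lengths[i] - prev_len          # pad the counter to the new length
        words[i] = format(value, "b").zfill(lengths[i]) if lengths[i] else ""
        value += 1                               # next unused codeword of this length
        prev_len = lengths[i]
    return words


print(kraft_sum([2, 1, 3, 3]))           # 1
print(instantaneous_code([2, 1, 3, 3]))  # ['10', '0', '110', '111']
```

A list such as `[1, 1, 1]` has Kraft sum 3/2 and is rejected, since no instantaneous code contains three distinct one-bit codewords.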

Fix a universal computer $U$ which can be constructed from an effectively
given list of requirements (consult [?], Theorem 3.2). Consider the set of all
programs $\{p_k\}$ for $U$ such that the output of computation $U(p_k, d)$ is
defined. Since $B$ is a bijection, we can write $U(p_k, d) = (r_k, s_k)$,
where $r_k$ and $s_k$ are strings from $X$. Moreover, because $U$ is a
universal computer, any pair of strings $(\alpha, \gamma)$ can be generated
this way. In what follows we consider only those $p_k$ for which
$s_k \neq \Lambda$. For every fixed $s$ from the set $\{s_k\}$ we construct a
list of requirements

$$(r_k, \; |p_k| - K_U(s|d) + \kappa_{s,d}) , \qquad k = 1, 2, \dots \tag{6}$$

where $|p_k|$ is the length of the program $p_k$, and $\kappa_{s,d}$ is some
constant. It was shown ([?], Theorem 3.8) that the constant $\kappa_{s,d}$ can
be chosen large enough such that these requirements satisfy the Kraft
inequality. Fix any $\delta \in \mathbb{N}$, and consider a sublist of
requirements (6):

$$(r_k, \; |p_k| - K_U(s|d) + \kappa_{s,d}) , \qquad r_k, d \in \mathbb{S}_\delta , \tag{7}$$

where $\mathbb{S}_\delta$ is the set of $\delta$-simple strings. For any
$s \in \mathbb{S}_\delta$, we can find
$\kappa = \max\{\kappa_{s,d} \,|\, s, d \in \mathbb{S}_\delta\}$, then choose
$\kappa_{s,d} = \kappa$, and construct a new list of requirements

$$(r_k, \; |p_k| - K_U(s|d) + \kappa) , \qquad r_k, d \in \mathbb{S}_\delta . \tag{8}$$

For any fixed $s \in \mathbb{S}_\delta$ these requirements satisfy the Kraft
inequality by construction. Furthermore, since $\mathbb{S}_\delta$ is finite
and $B$ is recursive these requirements can be effectively given. This means
that for any $s \in \mathbb{S}_\delta$ there is a computer $W_s$ that satisfies
these requirements: consult ([?], Theorem 3.2) for further details.

For each value of $s \in \mathbb{S}_\delta \setminus \{\Lambda\}$ we use (8)
to construct one $W_s$. We define $W_\Lambda = U$, and form the set
$\mathcal{W}_U = \{W_s \,|\, s \in \mathbb{S}_\delta\}$. This set contains the
original computer $U$ as a somewhat special element. Having the computer $U$
at our disposal, it would take at least $K_U(s|d)$ bits to specify any other
$W_s$ from the set $\mathcal{W}_U$ given data $d$. We can now see that
requirements (8) are designed in such a way that more complex computers,
i.e. larger $K_U(s|d)$, will have shorter programs,
$l_k(d) = |p_k| - K_U(s|d) + \kappa$. This is exactly the property that we
wanted to use as a natural restriction that defines a realistic class of
computers.
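
The tradeoff can be illustrated on a toy scale. The sketch below assumes a hypothetical finite table of programs for $U$ (for one fixed data string $d$) and hypothetical values of $K_U(s|d)$ that are simply postulated rather than computed. It forms the requested lengths $|p_k| - K_U(s|d) + \kappa$ for each $s$ and searches for the smallest common $\kappa$ for which every list satisfies the Kraft inequality. All names and numbers are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical programs of U for one fixed d: (program p_k, result r_k, label s_k).
U_TABLE = [
    ("00", "0", "0"),
    ("01", "1", "0"),
    ("100", "0", "11"),
    ("101", "1", "11"),
    ("110", "00", "11"),
]

# Postulated complexities K_U(s|d) of the computers W_s (assumed, not computed).
K_U = {"0": 1, "11": 2}


def requirement_lengths(kappa: int):
    """For each s, the list of pairs (r_k, |p_k| - K_U(s|d) + kappa)."""
    reqs = defaultdict(list)
    for p, r, s in U_TABLE:
        reqs[s].append((r, len(p) - K_U[s] + kappa))
    return dict(reqs)


def satisfies_kraft(reqs) -> bool:
    """True if every list has non-negative lengths and Kraft sum at most 1."""
    for pairs in reqs.values():
        lengths = [l for _r, l in pairs]
        if min(lengths) < 0 or sum(Fraction(1, 2 ** l) for l in lengths) > 1:
            return False
    return True


def smallest_kappa() -> int:
    kappa = 0
    while not satisfies_kraft(requirement_lengths(kappa)):
        kappa += 1
    return kappa


kappa = smallest_kappa()
print(kappa)                       # 1
print(requirement_lengths(kappa))  # all requested lengths equal 2 in this toy case
```

In this toy table the programs for the more complex computer ($s = 11$, postulated complexity 2) shrink from 3 bits to 2, while those for $s = 0$ keep their length, so that complexity of the computer plus program length is the same in both cases.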

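In what follows we restrict our attention to the set $\mathcal{W}_U$. We
define a computer $W$ which is universal for the set $\mathcal{W}_U$, i.e.
which is designed to simulate any computer $W_s \in \mathcal{W}_U$:

$$W(p, (s, d)) = W_s(p, d) . \tag{9}$$

Theorem 1

For any $\alpha, d \in \mathbb{S}_\delta$, and for any
$\gamma \in \mathbb{S}_\delta \setminus \{\Lambda\}$, we have

$$K_W(\alpha|\gamma, d) = K_W(\alpha, \gamma|(\Lambda, d)) - K_W(\gamma|(\Lambda, d)) + \kappa . \tag{10}$$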

Proof

Consider the program $\tilde{p}_k$ which causes $W_s \in \mathcal{W}_U$ to
produce the result $r_k \in \mathbb{S}_\delta$ given data $d$:

$$W_s(\tilde{p}_k, d) = r_k . \tag{11}$$

By definition of $W_s$, the length of $\tilde{p}_k$ satisfies the requirement

$$\forall s \in \mathbb{S}_\delta \setminus \{\Lambda\} \text{ and } \forall d \in \mathbb{S}_\delta : \quad |\tilde{p}_k| = |p_k| - K_U(s|d) + \kappa , \tag{12}$$

where $p_k$ is the program for $U$ such that

$$U(p_k, d) = (r_k, s_k) , \qquad s_k \neq \Lambda . \tag{13}$$

We define the set $\mathbb{K} = \{i \,|\, U(p_i, d) = (r_k, s_k)\}$, which can
contain more than one element since some of the pairs $\{(r_k, s_k)\}$ can
coincide. From the construction of $W_s$ we note that requirements (8)
associate exactly one program $\tilde{p}_k$ with the corresponding program
$p_k$. In other words there is a one-to-one correspondence between programs
$\tilde{p}_k$ and $p_k$ (which is given explicitly by the index $k$). This
means that the set $\mathbb{K}$ coincides with the set
$\tilde{\mathbb{K}} = \{i \,|\, W_s(\tilde{p}_i, d) = r_k\}$. Since $U$, $d$
and $s$ are fixed, and using the identity $\mathbb{K} = \tilde{\mathbb{K}}$,
we have from Eq. (12)

$$\min_{i \in \tilde{\mathbb{K}}} |\tilde{p}_i| = \min_{i \in \mathbb{K}} |p_i| - K_U(s|d) + \kappa , \qquad s \in \mathbb{S}_\delta \setminus \{\Lambda\} . \tag{14}$$

By definition of $W$ we have

$$W(\tilde{p}_k, (s, d)) = W_s(\tilde{p}_k, d) = r_k , \qquad s \neq \Lambda . \tag{15}$$

This means, by definition of Kolmogorov complexity, that
$K_W(r_k|s, d) = \min_{i \in \tilde{\mathbb{K}}} |\tilde{p}_i|$,
$s \neq \Lambda$. Similarly from Eq. (13), we have
$K_U(r_k, s_k|d) = \min_{i \in \mathbb{K}} |p_i|$ and therefore Eq. (14)
becomes

$$K_W(r_k|s, d) = K_U(r_k, s_k|d) - K_U(s|d) + \kappa . \tag{16}$$

Because $W(p, (\Lambda, d)) = U(p, d)$ we have, for instance,
$K_U(s|d) = K_W(s|(\Lambda, d))$. Using this observation to transform both
terms at the right hand side of Eq. (16), and choosing $s = s_k$ we have
Eq. (10) as required. □
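
The last substitution can be written out explicitly. The following lines are a sketch that uses only Eq. (16) and the identity $W(p, (\Lambda, d)) = U(p, d)$ stated in the proof; identifying $\alpha$ with $r_k$ and $\gamma$ with $s_k$ is our reading of the final step.

```latex
\begin{align*}
K_U(r_k, s_k \,|\, d) &= K_W(r_k, s_k \,|\, (\Lambda, d)), \\
K_U(s_k \,|\, d)      &= K_W(s_k \,|\, (\Lambda, d)), \\
K_W(r_k \,|\, s_k, d) &= K_W(r_k, s_k \,|\, (\Lambda, d))
                         - K_W(s_k \,|\, (\Lambda, d)) + \kappa .
\end{align*}
```

Writing $\alpha = r_k$ and $\gamma = s_k \neq \Lambda$ in the last line reproduces the form of Eq. (10).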

Note that, since U is an arbitrary prefix computer, the above analysis provides 
a grouping of all possible reference computers into naturally restricted classes. 